Using Index Structures for Anytime Stream Mining
Authors
Abstract
Stream data mining has gained considerable attention in recent years because streaming data is abundant in both professional and personal applications. Solutions have been proposed for many mining tasks, such as clustering, classification, frequent itemset mining and aggregation. Stream mining is especially challenging because of the large (usually endless) volume of data and the time constraints imposed by the stream's arrival rate. We recently presented an index-based solution for anytime stream classification that handles both large amounts of data and arbitrary arrival times. In this paper we present ongoing work in which we investigate bulk loading strategies to improve classification accuracy under anytime constraints. We show promising results and discuss future challenges related to index-based classification on data streams. Furthermore, we discuss extensions of our technique to other data mining tasks.
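The abstract does not describe the index itself, so the following is only a generic illustration of what "anytime" classification over a hierarchical index can mean. It assumes a hypothetical per-class tree in which every node summarizes its points by a one-dimensional Gaussian (`Node`, `leaf`, `merge` and `anytime_classify` are illustrative names, not the authors' structure or API), and it ignores bulk loading entirely. Each refinement replaces one inner node by its children, and a class decision is available whenever the time budget runs out, for example when the next stream item arrives.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node: summarizes n 1-D points by a Gaussian
    (mean, var). Inner nodes aggregate their children."""
    n: int
    mean: float
    var: float
    children: list = field(default_factory=list)


def leaf(x, var=0.1):
    # a single training point, represented as a Gaussian kernel
    return Node(1, x, var)


def merge(children):
    # aggregate children into a parent summary (law of total variance)
    n = sum(c.n for c in children)
    mean = sum(c.n * c.mean for c in children) / n
    second = sum(c.n * (c.var + c.mean ** 2) for c in children) / n
    return Node(n, mean, second - mean ** 2, list(children))


def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def score(node, x):
    # n * N(x | mean, var); summed over a class frontier and divided by the
    # total count, this equals prior(class) * mixture density at x
    return node.n * gauss(x, node.mean, node.var)


def anytime_classify(trees, x, budget):
    """trees: class label -> root Node. budget: number of node refinements
    allowed before an answer is needed."""
    frontiers = {c: [root] for c, root in trees.items()}
    for _ in range(budget):
        # pick the unrefined inner node with the largest contribution at x
        inner = [(score(nd, x), c, i) for c, fr in frontiers.items()
                 for i, nd in enumerate(fr) if nd.children]
        if not inner:
            break  # all frontiers consist of leaves; nothing left to refine
        _, c, i = max(inner)
        node = frontiers[c].pop(i)
        frontiers[c].extend(node.children)
    # an answer exists at every step: the class with the largest summed score
    return max(frontiers, key=lambda c: sum(score(nd, x) for nd in frontiers[c]))


trees = {
    "A": merge([merge([leaf(0.0), leaf(0.2)]), merge([leaf(1.0), leaf(1.2)])]),
    "B": merge([merge([leaf(5.0), leaf(5.2)]), merge([leaf(6.0), leaf(6.2)])]),
}
print(anytime_classify(trees, 0.1, budget=0))  # decided from the roots alone
print(anytime_classify(trees, 0.1, budget=3))  # decided after three refinements
```

A budget of 0 answers from the coarsest summaries; a larger budget descends toward individual kernels and can only sharpen the class densities. Refining the node with the largest contribution first is one plausible priority rule among several.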